Design and statistical analysis reporting among interrupted time series studies in drug utilization research: a cross-sectional survey

Introduction Interrupted time series (ITS) design is a commonly used method for evaluating large-scale interventions in clinical practice or public health. However, improper use of this method can lead to biased results.
Objective To investigate the design and statistical analysis characteristics of drug utilization studies using ITS design, and to give recommendations for improvement.
Methods A literature search of PubMed was conducted for studies published from January 2021 to December 2021. We included original articles that used ITS design to investigate drug utilization, without restriction on study population or outcome types. A structured, pilot-tested questionnaire was developed to extract information on study characteristics and details of design and statistical analysis.
Results We included 153 eligible studies. Among those, 28.1% (43/153) clearly explained the rationale for using the ITS design and 13.7% (21/153) clarified the rationale for the specified ITS model structure. One hundred and forty-nine studies used aggregated data for ITS analysis, and 20.8% (31/149) clarified the rationale for the number of time points. Consideration of autocorrelation, non-stationarity and seasonality was often lacking, and only 14 studies mentioned all three methodological issues. Missing data were mentioned in 31 studies. Only 39.2% (60/153) reported the regression models, while 15 studies gave an incorrect interpretation of level change due to time parameterization. Time-varying participant characteristics were considered in 24 studies. Of the 97 studies containing hierarchical data, 23 clarified the heterogeneity among clusters and used statistical methods to address this issue.
Conclusion The quality of design and statistical analyses in ITS studies for drug utilization remains unsatisfactory. Three emerging methodological issues warrant particular attention: incorrect interpretation of level change due to time parameterization, time-varying participant characteristics and hierarchical data analysis. We offer specific recommendations about the design, analysis and reporting of ITS studies.
Supplementary Information The online version contains supplementary material available at 10.1186/s12874-024-02184-8.


Introduction
Drug utilization research has received substantial attention from health researchers and policymakers in recent years. Interventions in drug utilization research may range from clinical guideline publications to drug programmes or policies. The randomized controlled trial is considered the gold standard design for evaluating the causal effect of an intervention. Nevertheless, it is not always feasible or ethical in this field, as these interventions are often targeted at the population level [1][2][3][4]. As a strong quasi-experimental design, the interrupted time series (ITS) design has increasingly been used to evaluate drug utilization interventions by comparing the level and trend of outcomes after the intervention with the underlying pre-intervention level and trend [5][6][7][8][9][10].
Several important methodological issues need to be considered when conducting ITS studies, such as time period selection, sample size, missing data, autocorrelation, non-stationarity and seasonality, which have been described in previous tutorials [5, 11-15]. Three issues that have emerged in recent years also require additional methodological consideration. First, the correct setting and interpretation of the ITS regression model should be emphasized. In a particular ITS model setting that includes parameters for level change and slope change, $\beta_2$ represents the level (or intercept) change immediately following the intervention [16]. However, some current peer-reviewed studies specified the ITS model incorrectly yet still described $\beta_2$ as the level change at the time of interruption, which leads to an erroneous and biased result for the main effect of the immediate level change of the time series (see details in Appendix 1) [17]. Second, participants' characteristics may not be constant across time points. The ITS method might therefore be affected by time-varying confounding, which may result in misleading findings [5,6]. Third, heterogeneity among clusters should be appropriately addressed in ITS studies if the dataset has a hierarchical structure with within- and/or between-cluster heterogeneity [18][19][20]. One study pointed out that authors need to consider this issue and use appropriate analysis methods such as mixed-effects models [19]. However, since not all articles contain multilevel data, the proportion of studies that have not yet addressed this issue remains unclear.
Previous studies might not have comprehensively addressed these methodological issues [11, 21-25]. The last survey of ITS studies in drug utilization research was published in 2015 and did not cover the new methodological issues mentioned above [21]. Additionally, although an increasing number of tutorials for conducting ITS have been published in recent years, it is still unclear whether the quality of current ITS studies in drug utilization has improved. Thus, we conducted a cross-sectional survey of published ITS studies in drug utilization, aiming to identify potential methodological gaps and give suggestions for improvement.

Eligibility criteria
We included empirical studies that used ITS design and focused on interventions related to drug utilization, with no limitation on study population or types of outcomes. The definition of an ITS study followed previous methodological studies: "a time series of a particular outcome of interest is used to establish an underlying trend, which is 'interrupted' by an intervention at a known point in time" [4,5]. We focused on ITS studies of drug utilization, in which the intervention concerned various medical, social and economic aspects of drug use [26].
Studies meeting any of the following criteria were excluded: (1) letters, commentaries, study protocols, conference abstracts, systematic reviews, meta-analyses, randomized controlled studies; (2) not written in English; (3) a methodological paper with an ITS example; (4) ITS analysis was not the main result.

Search strategy
We searched PubMed in January 2022 for ITS studies published in 2021. We used MeSH terms and text words related to interrupted time series to develop the search strategy, including "interrupted time series", "change point", "segmented regression" and "repeated measures study", among others. The details of the search strategy are presented in Appendix 2.

Study process
A structured, pilot-tested checklist was developed to screen titles, abstracts, and full texts for potentially eligible studies, using prespecified eligibility criteria. Two researchers (YZ and YH), who were trained in epidemiology and biostatistics and experienced in ITS analysis, screened the records independently. Any disagreements were resolved by discussion and adjudication by a third reviewer (YR).
Before formally extracting the data, two researchers (YZ and YH) randomly selected 15 (10%) eligible studies and extracted the data independently. They checked for consistency, and any disagreements were adjudicated by a third reviewer (YR). Overall, the agreement between the two researchers (YZ and YH) was above 95%. A single researcher (YZ) then extracted data from the remaining 138 studies.

Development of the data extraction form
A structured questionnaire was developed to investigate the design and analysis characteristics of ITS studies in drug utilization research. First, we reviewed the published methodological literature and statements to design the initial data extraction form [5, 21-25]. Then, we invited four experts (XS, RY, JT and MY) in clinical epidemiology and biostatistics to review and discuss the data extraction form, assessing the relevance and applicability of candidate items. We randomly selected 30 studies for pilot extraction to check whether any items were inappropriate.
Finally, we identified three parts of the design and analysis characteristics of ITS studies: (1) general characteristics, (2) design, and (3) statistical analysis. The detailed items of the data extraction form are shown in Appendix 3.

Data analysis
All items in the data extraction form were summarized using descriptive statistics. For categorical variables, we presented frequencies and percentages; for continuous variables, we presented mean with standard deviation (SD) or median with interquartile range (IQR). All statistical analyses were conducted using Stata 15.1.

Results
The PubMed search identified 1862 records. After reviewing titles, abstracts and full texts, 153 studies were finally included in our analysis (Fig. 1). Appendix 4 shows the details of all included studies.

Study design

Rationale for ITS design
Among the included studies, only 28.1% (43/153) reported the rationale for using ITS design (Table 2). All studies clearly specified the time of the intervention (segment time). A control group was used alongside the ITS design in 12.4% (19/153) of studies to strengthen the validity of the study design.
Among the 149 studies with aggregated units of analysis, the most common time interval was monthly (73.8%, 110/149). The median (IQR) number of total time points was 48 (30, 72). Only 20.8% (31/149) clarified the rationale for the number of time points (sample size calculation).

Basic statistical analysis characteristics
Various statistical methods were used to analyze the ITS studies (Table 3). Among the 153 studies, ordinary least squares (OLS; 30.7%, 47/153) and autoregressive integrated moving average (ARIMA; 15.7%, 24/153) were the two most commonly used ITS regression methods.

Table 1 General characteristics of included ITS studies (N=153)
a Administrative medical databases are massive repositories of data collected in healthcare for various purposes. Such databases are maintained in hospitals, health maintenance organizations and health insurance organizations. In this article, other administrative databases included administrative databases that were not from hospitals or health insurance organizations
b Others included registry (6 studies), survey

Basic methodological considerations (Autocorrelation, non-stationary and seasonality)
Among the 149 studies with aggregated-level outcome and time series data, 14 studies considered all three methodological issues of time series data (Table 4). In total, 117 studies considered at least one of the three methodological issues. Specifically, autocorrelation was acknowledged in 108 studies, non-stationarity in 20 studies, and seasonality in 60 studies. Among the studies that adjusted for autocorrelation, non-stationarity and seasonality, 25.0% (27/108), 5.0% (1/20) and 16.7% (10/60), respectively, failed to specify the methods they used.
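As a minimal illustration of how first-order autocorrelation might be checked, the sketch below computes the lag-1 autocorrelation and the Durbin-Watson statistic from the residuals of a fitted ITS model. The residual values and function names are hypothetical and are not drawn from any included study. A Durbin-Watson value near 2 suggests little first-order autocorrelation, while values well below 2 suggest positive autocorrelation.

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denominator = sum(c * c for c in centered)
    numerator = sum(centered[i] * centered[i - 1] for i in range(1, n))
    return numerator / denominator


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals (ranges from 0 to 4)."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(r * r for r in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly (positively autocorrelated).
smooth = [1.0, 0.8, 0.6, 0.3, -0.2, -0.6, -0.9, -1.0, -0.7, -0.3, 0.2, 0.8]
# Hypothetical residuals that flip sign each period (negatively autocorrelated).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

for name, series in [("smooth", smooth), ("alternating", alternating)]:
    print(name, round(lag1_autocorrelation(series), 3), round(durbin_watson(series), 3))
```

This check covers only first-order autocorrelation; higher-order or seasonal autocorrelation would need additional diagnostics, such as inspecting the autocorrelation function at longer lags.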

Incorrect interpretation of level change due to time parameterization
Of the 153 studies, only 39.2% (60/153) reported the specific regression model and interpreted the coefficients in the article or supplementary files (Table 5).
Moreover, we found that 15 studies gave incorrect interpretations of level change due to time parameterization. Specifically, these studies reported the model as $Y_t = \beta_0 + \beta_1 T_t + \beta_2 X_t + \beta_3 T_t X_t$, which included parameters for level change and slope change, but described $\beta_2$ as the "level change at the time of interruption". As discussed in Appendix 1, this leads to an incorrect result for the effect of the immediate level change if the study used this model for statistical analysis.
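The reason for this misinterpretation can be sketched with a short derivation. It assumes that $T_0$ denotes the time of interruption and that $X_t = 1$ from $T_0$ onwards and $0$ before; this notation is an assumption for illustration and may differ from that used in Appendix 1. The level change at the interruption is the difference between the fitted post-intervention value and the extrapolated pre-intervention trend at $T_t = T_0$.

```latex
% Uncentred interaction term
Y_t = \beta_0 + \beta_1 T_t + \beta_2 X_t + \beta_3 T_t X_t
\quad\Longrightarrow\quad
\underbrace{(\beta_0 + \beta_1 T_0 + \beta_2 + \beta_3 T_0)}_{\text{with intervention}}
- \underbrace{(\beta_0 + \beta_1 T_0)}_{\text{counterfactual}}
= \beta_2 + \beta_3 T_0

% Interaction term centred at the interruption
Y_t = \beta_0 + \beta_1 T_t + \beta_2 X_t + \beta_3 (T_t - T_0) X_t
\quad\Longrightarrow\quad
(\beta_0 + \beta_1 T_0 + \beta_2) - (\beta_0 + \beta_1 T_0) = \beta_2
```

Under these assumptions, $\beta_2$ equals the level change at the interruption only in the centred model; in the uncentred model it is the difference between the two fitted lines extrapolated back to $T_t = 0$.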

Findings and interpretations
This study provides updated evidence on the quality of ITS studies and shows that most ITS studies in drug utilization fail to comprehensively consider the methodological issues of design and statistical analysis.
a Location: use a different area as control; Outcome: use an outcome not affected by the intervention as control; Characteristic: use a group not targeted by an intervention as control; Historical: compare a previous group to a current group
b Aggregated data refer to summary statistics (e.g., mean, percentage, median) calculated across individual data
c We reported the median and interquartile range (25% and 75%)
d Delay: where the delay was acknowledged and included in pre- or post-interruption segment; Excluded: where a separate segment was used for the delay time period, but this was excluded from the analysis; Segment: where a separate segment was used for the delay time period, and this was included in analysis; Sensitivity: where the delay was modelled as part of a sensitivity analysis, but ignored in main analysis
e Some studies used more than one method to deal with delay

Three main issues of ITS study design need to be considered. First, most studies did not give the rationale for using ITS design. Although it is an appropriate method when randomization is not feasible, the basic ITS design may be affected by confounding due to co-interventions or other events occurring around the study period [27,28]. Thus, we recommend that authors give the rationale for using ITS design, such as ethical considerations or the lack of an adequate control group. Second, most of the studies did not report how the study period, time interval and sample size were considered. The selection of the time period should balance statistical requirements against decisions driven by the research problem [29,30]. A simulation study found that sample size per time point had a large impact on power in ITS studies. Even when studies met the minimum number of time points, most analyses were underpowered if the sample size per time point was low [30]. Therefore, authors should balance the number of time points and the sample size per time point. Meanwhile, if the period is too short, there may be too little data to model the trend. However, if the period is too long, it may be affected by historical bias. Third, most of the studies used the ITS model structure with both level change and slope change. However, only a few studies considered how the intervention was expected to act (whether it would lead to an immediate change or a sustained change) and chose the ITS model structure accordingly [5,29]. When the model is misspecified, the results of ITS are no longer robust [31].
Meanwhile, we found five issues that may affect the quality of statistical analysis. First, most of the studies did not mention missing data. One study noted that most ITS studies used data aggregated at the population level, which can lead to bias when data are missing at random at the individual level [11]. In a simulated scenario in that study, if the outcome is missing at random for males but fully observed for females, the aggregated data will show an incorrect seasonal pattern. Second, consideration of autocorrelation, non-stationarity and seasonality is still poor among current ITS studies. Ignoring these characteristics of time series data may not provide robust results [5]. Third, more than half of the studies did not report the regression model, which might leave readers with an unclear understanding of the statistical methods. Moreover, among the studies that reported the regression model, 15 used the term $T_t X_t$ instead of $(T_t - T_0) X_t$ in the ITS regression model, which leads to an erroneous result for the main effect of the level change. Fourth, consideration of time-varying confounding is lacking. Participant-level confounding should be considered and controlled for if the population changed at each time point [6,19]. Fifth, most of the included studies ignored the hierarchical data structure and aggregated the outcome to the population level, even if they had the opportunity to aggregate the outcome at a lower level. As discussed above, when the intervention is implemented regionwide or nationwide, the dataset may contain a hierarchical structure. If the outcome is aggregated at a higher level, which does not account for the heterogeneity among patients and across hospitals, aggregation bias will result [18,19,25,32].
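The consequence of the time parameterization described in the third issue can be checked numerically. The sketch below is an illustration under stated assumptions, not the analysis used in any included study: it builds a hypothetical noiseless monthly series with a known level change of -10 at an assumed interruption time T0 = 25, then fits both parameterizations by ordinary least squares using a small hand-written solver. All names and values are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical series: 48 monthly points, interruption at T0 (assumed values).
T0 = 25
times = list(range(1, 49))
post = [1 if t >= T0 else 0 for t in times]
BASELINE, SLOPE, LEVEL_CHANGE, SLOPE_CHANGE = 100.0, 0.5, -10.0, -0.3
y = [
    BASELINE + SLOPE * t + LEVEL_CHANGE * x + SLOPE_CHANGE * (t - T0) * x
    for t, x in zip(times, post)
]

# Interaction term T_t * X_t (the parameterization that was misinterpreted).
uncentered = ols([[1, t, x, t * x] for t, x in zip(times, post)], y)
# Interaction term (T_t - T0) * X_t.
centered = ols([[1, t, x, (t - T0) * x] for t, x in zip(times, post)], y)

print("beta2, uncentered model:", round(uncentered[2], 4))
print("beta2, centered model:  ", round(centered[2], 4))
print("beta2 + beta3 * T0, uncentered:", round(uncentered[2] + uncentered[3] * T0, 4))
```

Under these assumed values, the centered model recovers the level change of -10 as its $\beta_2$, whereas the uncentered model yields a $\beta_2$ of about -2.5; the true level change is recovered from the uncentered fit only as $\beta_2 + \beta_3 T_0$.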

Comparison with other studies
Several studies have systematically reviewed methodological issues regarding the design and statistical analysis of ITS studies [11, 21-25]. All of the previous reviews pointed out that consideration of autocorrelation, non-stationarity and seasonality was limited, which is in line with our study. Five of them reported sample size considerations that focused on the minimum number of data points, while our study also points out that the maximum number of data points should be considered. Some methodological issues have improved among the ITS studies published in 2021. For example, for the item "clearly segment time", the reported proportion rose notably, from 84.5% (as observed in Jandoc et al.'s review) to 100% in our study. However, some issues remain a concern (e.g., sample size, missing data, incorrect interpretation of level change due to time parameterization, time-varying participant-level confounding, and hierarchical data structure). A previous review that included a meta-analysis and re-analysis of ITS studies found that 5% (2/41) of studies did not report the statistical method used [33]. In our review, this proportion is 15.0% (23/153), indicating a higher proportion of inadequate reporting in original articles.
Our study gave a detailed analysis of three previously overlooked but important methodological issues: common errors in parameter interpretation of ITS models, limited consideration of individual-level characteristics and poor handling of heterogeneous data among clusters. Although a methodology study published in 2020 pointed out the parameter interpretation problem and a corrigendum to the original tutorial had been made [17,34], this was still a common mistake in ITS empirical studies published in 2021. Individual-level characteristics are also an important issue. If patient characteristics vary over time, it is essential to control for these changes using appropriate methods. For potential cluster effects, our results showed that most of the studies had the opportunity to control for potential heterogeneity across clusters, but few of them considered it.

Strengths and limitations
a If the researchers set an ITS model with both level change and slope change, and used the product of their calendar time variable and the indicator variable for pre- versus post-intervention time periods to represent the post-intervention linear segment, then the interpretation was wrong (more details in Appendix 1)
b For this item, we did not calculate the proportion as the denominator is difficult to define. We believe that using either 60 (the number of studies reporting regression models) or 139 (the number of models including level change and slope change) as the denominator would be inappropriate
c Some studies used more than one method to control individual-level characteristics
d This part only included ITS studies with aggregated analysis units (n = 149) because the mishandling of data hierarchy only takes place in ITS studies with aggregated analysis units
e For the studies that contained individual-level data, we counted the number of levels in the dataset excluding the individual level (which cannot be measured repeatedly). For example, if the raw data were a three-level hierarchy of patient, hospital and region, the repeatedly measured levels were hospital and region, and we defined this dataset as two-level hierarchical data for ITS analysis. For the studies that only contained aggregated data, we counted the number of levels in the dataset directly

This study gives a comprehensive survey of the methodological issues in the design and statistical analysis of ITS studies in drug utilization. To the best of our knowledge, this is the first cross-sectional survey that specifically assesses the incorrect interpretation of level change due to time parameterization, time-varying individual-level covariates and handling of hierarchical data in current ITS studies, which have been highlighted in the methodological literature. Meanwhile, we updated the current practices of ITS in the field of drug utilization research. ITS is a frequently used method for evaluating population-level interventions, and a series of papers on methodological considerations has been published over the past few years. It is therefore worth analyzing and showing the methodological limitations of current ITS practice.
There are also three limitations in our study. First, we only included ITS studies published in 2021 and used a single database for searching. However, since PubMed contains nearly all healthcare science and services and public health research journals, we think that it represents current practices sufficiently. Second, we assessed the design and statistical characteristics through the reporting of each article. If the reporting of these aspects is insufficient, we cannot determine the items and the results may be inaccurate. Third, some items may not be relevant to all studies. For example, in ITS studies using aggregated data, authors might not be able to assess the proportion of missing data at the individual level. Consequently, they may not report missing data in their articles.

Conclusion
In summary, we identified a series of deficiencies in design and statistical analysis among current ITS studies, showing that the basic methodological issues have not improved and that some new issues are not widely considered (i.e., incorrect interpretation of level change due to time parameterization, time-varying individual characteristics and hierarchical data structure). Although a series of methodology reviews and tutorials have described the important issues in ITS design, there is still a significant gap between guidelines and practice in ITS studies in drug utilization research, underscoring the need to develop a clearer guide and checklist for conducting ITS studies.

Fig. 1 Flow diagram of the selection results

Table 1 footnotes (continued)
b Others included registry (6 studies), survey (3 studies), statistical yearbook (2 studies) and cohort study (1 study)
c Others included drug shortage (1 study) and change drug packaging (1 study)
d Others included Europe (4 studies), city (1 study) and community (1 study)
e Others included cannabis-related criminal offences (1 study)

Where did the author report the regression model and the interpretation of coefficients? (N = 60)
How to control individual-level characteristics (N = 24) c

Table 2 Design characteristics (rationale for ITS, data handling and model structure) of included studies

Table 3 Basic statistical characteristics of included ITS studies
a This item refers to the statistical method for the main results in a study
b Others included fixed effect model (1 study), negative binomial model (1 study), quasi-Poisson model (1 study) and linear probability model (1 study)
c Transition period: change the interrupted time or time period in the regression model; Change ITS model setting: change the ITS impact model (e.g., from both level and slope change to only level change)
Some studies used more than one method for sensitivity analysis

Table 4 Characteristics of the basic methodological considerations (autocorrelation, non-stationarity, seasonality) (only for ITS with aggregated unit)

Table 5 Additional methodological considerations (parameters setting, individual-level covariates and hierarchical data structure) for ITS studies